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Abstract 

Multigrid convergence rates degenerate on problems with stretched grids or 
anisotropic operators, unless one uses line or plane relaxation. For three dimen- 
sional problems, only plane relaxation suffices, in general. While line and plane 
relaxation algorithms are efficient on sequential machines, they are quite awk- 
ward and inefficient on parallel machines. This paper presents a new multigrid 
algorithm, based on the use of multiple coarse grids, that eliminates the need for 
line or plane relaxation in anisotropic problems. We develop this algorithm, and 
extend the standard multigrid theory to establish rapid convergence for this class 
of algorithms. The new algorithm uses only point relaxation, allowing easy and 
efficient parallel implementation, yet achieves robustness and convergence rates 
comparable to line and plane relaxation multigrid algorithms. 

The algorithm described here is a variant of Mulder’s multigrid algorithm 
[5] for hyperbolic problems. The latter uses multiple coarse grids to achieve 
robustness, but is unsuitable for elliptic problems, since its V-cycle convergence 
rate goes to one as the number of levels increases. The new algorithm combines 
the contributions from the multiple coarse grids via a local switch, based on the 
strength of the discrete operator in each coordinate direction. This improvement 
allows us to show that the V-cycle convergence rate is uniformly bounded away 
from one, on model anisotropic problems. Moreover, the new algorithm can be 
combined with the idea of concurrent iteration on all multigrid levels to yield a 
highly parallel algorithm for strongly anisotropic problems. 


*This research was partially supported by the National Aeronautics and Space Administration under 
NASA Contract No. NAS1-18605 while the authors were in residence at ICASE, NASA Langley Research 
Center, Hampton, VA 23665. 
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1 Introduction 


As is well known, the convergence rate of multigrid algorithms based on point relaxation 
smoothers degenerates on problems exhibiting strong anisotropies. Thus line or plane re- 
laxation in each of the coordinate directions is often needed to obtain good multigrid con- 
vergence rates. Anisotropic discrete operators arise in problems in which the differential 
operator exhibits stronger coupling in some coordinate directions than in others, or when 
the discretization is based on highly stretched grids having mesh aspect ratios far from unity. 
For such problems, line relaxation typically suffices in two dimensional problems, while for 
three dimensional problems plane relaxation is often required. With standard (full coarsen- 
ing) multigrid in three dimensions, plane relaxation in each of the coordinate directions is 
required in general [6]. 

Plane relaxation is very expensive, especially for systems of equations. Given this, interest 
has focused recently on “semicoarsening” algorithms, in which the plane relaxation is carried 
out in only one direction, while the grid is “coarsened” only in the direction orthogonal to 
these planes [1]. The required plane solves can then be done recursively, via an analogous 
two dimensional algorithm, based on line relaxation in one direction and coarsening in the 
orthogonal direction. 

While such “semicoarsening” algorithms are fast and effective on sequential architectures, 
they have limited and awkward parallelism. The recursive solution of two dimensional prob- 
lems, required in the plane relaxation, takes 0( log 2 N) parallel operations, so that it takes 
0(log 3 N ) parallel operations per three dimensional V-cycle, where N is the number of mesh 
points. Thus it takes time at least O(log 4 N ) to converge to truncation error. 

An alternative way of achieving robustness, which avoids line and plane relaxations al- 
together, is to use multiple coarse grids formed by semicoarsening in each of the coordinate 
directions, as in Mulder’s hyperbolic algorithms [5]. In this paper, a modification of Mulder’s 
method is proposed, which substantially improves the convergence properties of his method, 
when applied to elliptic problems. This enables one to design effective V-cycle elliptic solvers 
for anisotropic problems, using only point relaxation smoothers. This new class of methods 
is shown to be simple, robust, and effective. 

One can also construct a highly parallel algorithm for anisotropic problems by combining 
the new algorithm with the idea of concurrent iteration on all multigrid levels [2]. This paper 
gives numerical experiments suggesting the efficacy of this approach, though we have yet to 
explore the use of this algorithm on parallel machines. 

2 Algorithm Design 

Strong coupling of the operator in a particular direction can easily degrade the performance 
of a multigrid method. There are several ways of accelerating convergence in the case of such 

anisotropy. Line relaxation and semicoarsening methods can be used to correct, respectively, 
the inability of the relaxation method to solve for some high frequencies, and the inability 
of the standard coarse grid to represent high frequencies. In the line relaxation method, the 
ineffective point relaxation is replaced by line relaxation in the direction of strong coupling. 
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By removing the residual components due to the strong coupling, the remaining residual 
due to weak coupling in the other direction can be effectively smoothed by the relaxation. 
Thus line relaxation together with standard coarsening is sufficient to uniformly reduce all 
Fourier components, in two dimensional problems. In the second approach, instead of using 
line relaxations, the grid is coarsened only in the direction of strongest coupling. In this 
case, point relaxation together with semicoarsening suffices to uniformly reduce all Fourier 
components. 

In addition to line relaxation and semicoarsening, other methods have been proposed that 
more aggressively solve for the difficult frequencies. Hackbusch’s Robust Parallel Multigrid 
[3] uses ‘forced aliasing’ to represent high frequency components on standard coarse grids. 
The high frequency components of the residual are aliased to low frequencies, solved for on 
a coarse grid, and then the coarse grid correction is “de-aliased” back to the high frequency. 
Although this method uses point relaxations and standard coarsening, it requires the use of 
multiple coarse grids, each with a different discrete operator, and is thus quite complex. 

In this paper we will look at a natural extension of the second approach, use of semi- 
coarsening. This approach was originally proposed by Mulder [5] for overcoming the problem 
of alignment in fluid flow computations. The simple technique of semicoarsening simulta- 
neously in all coordinate directions, and properly weighting the contributions from each of 
the coarse grids, yields an efficient, robust, and easily parallelizable multigrid method for 
general tensor product grid. 


3 The Algorithm 


The multiple semicoarse grid (MSG) correction scheme (for linear problems) is similar to 
the standard multigrid correction scheme, except that there are now extra grids involved. In 
two dimensions every grid is simultaneously coarsened in two directions. 

We first suppose, for simplicity, that the domain of the model boundary problem is the 
unit square, and that this problem is to be solved on an N x N uniform grid given by 

n h = {(ihjh) I i = 0, 1 , . . . , iV — 1 ; j = 0, 1 , . . . , N - 1 }, 

where h = 1/N and N is a power of two. Let the subgrid, Q m,n , obtained by successively 
semi coarsening Q-h, be the grid with N/2 m grid points in the x direction and Nj 2" grid 
points in the y direction. 

Notice that the notation does not distinguish between a grid obtained by semicoarsening 
first in the y direction and then in the x direction and a grid obtained by semicoarsening first 
in the x direction and then in the y direction. As shown in Mulder, in order to construct 
reasonable algorithms in three or more dimensions, the problems on equivalent grids must be 
combined. Figure 1 shows the interrelations between the various grids for a two dimensional 
problem with an 8 x 8 fine grid. With coarse grids combined as in this diagram, one has a 
only 16 grids altogether, while without combining the full binary tree of grids would contain 
69 grids and have no real numerical advantage. 

Now introducing more notation, the discrete equations on grid fl m,n are written as: 



2 


HiMii.iiiutiiiiiiiiii | h Ml ^Iin^i,iiiiuwitf l | mu, Ml I Mill || In mult 



The operators A m ' n can be thought of as either discretizations of the differential operator, 
L, on the grid Q m,n , or as operators obtained variationally from the fine grid and intergrid 
transfer operators. 

A k grid (N = 2 k ) V-cycle for this method is performed in three parts. In part a the 
information is propagated from the fine grids to the coarse grids, in part b the equations are 
solved on the coarsest grid, and in part c the information is propagated back from the coarse 
grids to the fine grids. 

MSG algorithm 

a. For l = 0, 1, 2, . . . , 2k — 1: 

For all m > 0, n > 0 such that m + n = /: 

1. If / > 0, combine restricted residuals on f l m,n 

2. Relax Vy times on the fl m ’ 7l -grid equations 

3. If m < k, transfer (restrict) residual from fi m,n to 

4. If n < k, transfer (restrict) residual from to fl” l,n+1 

b. For l = 2k: 

1. Combine restricted residuals on Q, k,k 

2. Solve (1) by any direct or iterative method on 

3. Transfer (interpolate) correction from Q, k,k to Q, k ~ 1,k and Q k,k 1 

c. For / = 2k — 1,2k — 2, . . . , 1, 0: 

For all m > 0, n > 0 such that m + n = /: 

1. If l > 0, combine interpolated corrections on f! m,n 

2. Relax z /2 times on the fl m,n -grid equations 

3. If m > 0, transfer (interpolate) correction from fl m,n to 

4. If n > 0, transfer (interpolate) correction from Q m,n to 


Any point relaxation on equation (1) can be used in steps a2 and c2. For steps a3, a4, 
c3, and c4 we consider intergrid transfer operators which are one dimensional. For example, 
the residual restriction operators could be the usual three point averaging formulas, given 
by the stencils: 


rm + l,n 
1 m,n 


7m,n-|-l 

1 m,n 



1 

2 


1 

4 


m + 1 ,n 
m,n 


n 




1 

2 


u 


-* m,n 
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Figure 1: Semicoarsening of an 8 x 8 grid 
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Similarly, one dimensional linear interpolation in the direction of grid refinement could be 
used to bring the corrections from coarse to fine grids. 

We now look at the details of step al, in which the residuals on f l m,n must be combined, 
and step cl, in which the corrections on f 'l m,n must be combined. The restricted residuals 
are simply averaged. Specifically, 


' §(ClV m " 1,n + C£- 1 r m '"- 1 ) if m > 0, n > 0, 

if n = 0, 
if m = 0. 


= < 


rm,n m-l,n 


( 2 ) 


I , r m,n_1 

V m,n- 1 


A weighted average of the interpolated corrections is used, so that 


f uh m,n C+ n i,n« TO+1 ’ n + ^ 2 m ’ n C:r+i« m ’ n+1 if m < *, n < k. 


u 


771,71 


< 


-. 771 + 1,71 

J m+l,n W 


if n = k, 


jm, n 

l -*771,71 + 1 


^ 171 , 71+1 


if m = k. 


The MSG algorithm can lead to multigrid convergence rates independent of mesh size, 
provided the weights ui and u 2 are chosen properly. 
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4 Convergence Theory 

In this section, we give our convergence proof for the MSG method for a constant coefficient 
model problem in two dimensions, and derive sufficient conditions on the weights. These 
conditions are then used in the next section to motivate the choice of weights for the variable 
coefficient problem. 

In order to show that the convergence rate of the MSG method is independent of mesh 
size for linear, constant coefficient model problems, we make the following assumptions: 


Al. The coarse grid operators are Galerkin, or ‘variational’. 

A2. The restriction and projection operators are adjoints of each other and are one 

dimensional. 

A3. The discretized operator is symmetric positive definite. 

A4. The linear part of the smoother and the discrete operator commute. 

We model both our analysis and our notation after that in [4], although some notational 
changes are needed in order to keep track of the multiple grids on each level. In particular, 
it is more convenient here to label grid levels in the reverse order, so that the grid level 
increases as the grid becomes coarser, contrary to the standard convention. If the two grid 
directions are to be coarsened a maximum of m times in the first direction and n times in 
the second direction (0 < m < fh and 0 < n < h) then the coarsest level will be given by 
/ = m -)- n and the coarsest grid will be given by the indices m, h. 

We are looking for the solution of 

A°'V'° = /°-° 

where A 0,0 is symmetric positive definite. For each coarser grid level, / = 1, . . . , /, we recur- 
sively define each of the operators A m,n , for m + n = /, 


A m,n 


jttl f Ti rm — l,n 

J m,n 


rm,n a m,n— 1 rm,n- 1 
v *m y n— l'* 1 m,n 


if m > 0 


if m = 0 


Note that if m and n are both positive then there are two ways to construct the coarse 
grid operators from the fine grid operator. However, since the intergrid operators are one- 
dimensional, 

7 771,71 jm,n — 1 jm,n tto — l,n /o\ 

771 , 71 — 1 771 — 1 ,71 — 1 ^ 771 - 1 , 71 -' 771 - 1 , 71—1 7 \° ) 

and therefore either way gives the same result. 

Our notation is as follows. Each of the A m,n are operators on a finite dimensional space, 
H m,n . Since we will only be looking at two grid levels at a time, we simplify the grid indices 
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by the shorthand notation: 


m, n 

771 + 1, 71 
m , 7i 4- 1 


k = 

= 

*2 = 

We also define the inner products 

(u k ,v k ) k = 

i 

[«*,»*]* = (A k u k , v k )k 

for any u k ,v k in H k . The second, ‘energy 7 , inner product induces a norm on H k , which we 
denote by || . ||*, thus 

Il4l* = [44* = (A k v k , v k ) k . 

We also define four subspaces of each space H k as follows. For i = 1,2: 

S< = 

T? = = 0 for all ti>* € £*} 

Corresponding to these subspaces, we define projection operators, T k and S k such that 

R{T k ) = Tf 
ker(T k ) = 

- I-T k . 

for i = 1,2. 

With the above notation, we are now ready to discuss the V-cycle convergence analysis. 
The approximation to the solution of the kth. grid equations is updated three times per V- 
cycle; once after the v x relaxation sweeps in step A2, once after the coarse grid correction in 
step Cl and once after the v 2 relaxation sweeps in step c2. We label the initial approximation 
as u k qj, and the approximations after each of the three updates as for i = 1,2 and 3. If 
Q denotes the relaxation operator, then the updates are given by 

4) = ^(4),/*) 

«(2) = “(.) + «?/£<$,+<•>}/£■<& 

«(3) = (T( »(2),/‘) 

where 

U ‘5, = MG(u % Y ^), j = 1,2. 

We make two additional assumptions in order to simplify the case when two grids on the 
same grid level are semicoarsened in opposite directions, yielding coarse grids of the same 
dimensions. Recall that the coarse grid problems on these coarse grids are combined to form 
a single problem. 
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A5. The initial approximation is equal to zero on all except the finest level: 

u k Q ^ = 0, for all k - m,n ± 0,0 

A6. There is no smoothing in the fine-to-coarse part of the V-cycle. 


v\ = 0 

These assumptions guarantee that the residuals from both grids are identical, as shown in 
the following lemma. 


Lemma 1 For m,n with 0 < m < m, 0 < n < n: 


rm,n _ jm,n rm- l,n _ r m . n f 
J l,nJ m ,n— 1 J 


m»n— 1 


Proof: 

The lemma is proved by a simple induction argument on the grid level /, using the 
additional assumptions A5 and A6, together with Equations (2) and (3). ® 


A standard multigrid V-cycle convergence result can be based on a single assumption, 
which combines both the smoothing and the approximation properties of the problem, 


namely 



> 1+0 


( \\T k F k v\\ 

V li^MI l 


[ ) 


i/“ 


for all v E H k and for all grid levels k. See [4]. Here F k is the linear part of the smoothing 
operator Q V2 . We assume sufficient regularity and take a = 1. Then if we define the multigrid 
convergence rate on the kth grid level as 


e k = inf{||u fc - MG k {v k ,f k )\\ k < e||w fc - v k \\ k for all v k E H k }, 


the V-cycle convergence theorem for standard multigrid algorithms is 

Theorem 1 (Standard) Let k > 2 and suppose we have already bounded e k ~ l 
Then 


by 


1 

1 + 0 ’ 


In our case, in which we have multiple coarse grids on each grid level, we can prove a 
similar result. In fact, the convergence of the MSG V-cycle algorithm can be as good as the 
convergence of a standard V-cycle multigrid in which evety grid is semicoarsened only in the 
optimal direction. 
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Theorem 2 (MSG) Suppose that 

Nit > ll^*»ll2 + A-||i?r‘»||J, * = 1 , 2 , (4) 

for all v G H k and for all grid levels k, and choose the weights, w k andu%, so that 

Pi 


u> k = 


If 


then 


Proof: 

First we note that 


Pi + P 2 

rnax{e m+1 ’ n ,£ m ’ n+l ) < min ( — ) 

Vi + A 1 + P2) 

k . . ( l 1 \ 

V 1 + A 1 + P 2 J 


UP l + U?2 — 1. 


(5) 


We denote the errors corresponding to the updates by for z = 0,1,2 and 3, where 


e <0 = uk ~ «(,) 


and where u k is the exact solution of the fcth grid equations. Using Lemma 1 we see that, 
for both i = 1 and i = 2, 

e * _ 7* — T k p k 

e (i) 1 k i u — J i e (i)- 

By our assumptions we also have 

e (i) = ef 0) . (6) 

Combining these results, we can write 


e (2) — ( W 1 T\ + + W X^Jfc, e ( 3 ) + w 2-?* 2 C( 3).- 

Then the error before and after the i >2 post-relaxation sweeps is 


e k 

e (3) 


F k f k 

t e (2) . 


Since we need to look at only two grid levels at a time, we will temporarily suppress the 
notational references to the current grid, k. Thus, we define, for i = 1,2 and j = 0, 1,2, 3, 

u k ' = u\ 


hi _ 


e 

e k - 
e (j) “ 


= e 


e U)i 


and so on. 
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Consider an arbitrary element, v, of H k , and let y — Fv. Then 


[e(3),w]fe = t e (2 ),y] 


— [(c^lTi + W 2 T 2 )e(o) + ^l/j^ e (3) + U} 2^k, e i3)i y] 

= wi ([Tie ( i), T\y\ + [/*, e| 3) , Siy]) + u 2 ([T’ 2 e( 1 ), T 2 y] + [^jt 2 e (3) ? s 2 y]) 

By the Cauchy-Schwartz inequality, 

I [ e ( 3 ), u ]| < (||T’ 1 e (1 )|| ||Tiy|| + £1 ||5'ie (1 )|| ||Sij/||) + u 2 (||T , 2 e (1 )|| \\T 2 y\\ + e 2 || \\S 2 y\\) , 

since, for i = 1,2, 

11^(3)11 = 114)11 < e «ll e (o)ll = e «ll u il = £.Pfc.«il = *ll£*(i)ll- 
Using the Cauchy-Schwartz inequality one more time gives us 

l[e(3),v]| 2 < (Wir' 1 =( l )l| I + l|5ie(i)|| 2 ) +W2(|jr 2 e(i)|| 2 + ||5’ 2 C(i)|| 2 )) 

• (^(IIT^II 2 + £,||S,»|| 2 ) +o. 2 (||J’ 2! ,|| 2 + e 2 ||5 2 j/|| 2 )) . 

Dividing through by the square norms of the initial error and v and using Equations (5) and 
(6) this simplifies to 

|[ e ( 3 )> u ]| 2 ||y|| 2 4 1 (||T 1 y|| 2 + £ 1 ||5 1 y|| 2 ) + u )2 (||T 2 y|| 2 +£ 2 ||5 2 y|| 2 )\ 

l|e ( 0 )|| 2 NI 2 - INI 2 1 IMI 2 / ' 

It is convenient to define two variables, t\ and t 2 , such that 


h = 



= M! 

IIj/II 2 


and rewrite our inequality as 


|[ g ( 3 ), ^]| 2 < 

ilerfIMI 2 ” 


~jT tJ~ + £l(l — *l)) + + £ 2(1 — ^ 2 )) • 


(7) 


The smoothing and approximation hypotheses given by Equation (4) of the theorem can 
then be rewritten in terms of the new variables as, 


_M1 

ll^ll 2 

1 HI 2 

||Fu || 2 


> 1 + /?Ui, 

^ 1 + /?2^2* 
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Figure 2 : Limits on 



We can therefore write inequality ( 7 ) in terms of t\ and t 2 with either of these upper bounds 


on 


so we are free to use whichever is smaller. Thus, 


l[e( 3 )>v]| 


2 <■ f 1 
< min 


1 


+ £i(l — t\)) + (^2(^2 + £2(1 — ^2))) ■ (8) 


l e (0)!| 2 || t> || 2 ' V 1 1 + 

Taking the maximum over all values of 1 1 and t 2 we arrive at the following bound on the 
convergence rate, 


e k < max ( min 

0 < h < 1 V 

0 < t 2 < 1 


TTKil' TTM) ((lJl(!l + e,(1 _ (l)) + + £2(1 ~ ' 2))) ) • 


Using the definitions of the weights uj\ and u > 2 and the conditions on the £,-’s, it follows that 

e k < min | 7 — — — , 7 — I . I 


k 1 + Pi 1 + @2 ) 


Note that the weights in the hypothesis of the theorem, while convenient, are not the 
only choice. All we really needed in the proof was that the weights lie within some bounds, 
determined by the /Vs. If we define the ratio, 77 = then the theorem will be proved 
provided lo\ < tj and lo 2 > 1 — 77 whenever 77 < 1 and > 1 — I/77 and u ? 2 < I/77 whenever 
77 > 1 . For the statement of the theorem, we have chosen u\ = 77/(77 + 1 ). See Figure 2. 
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5 Practical choice of the weights 


Suppose that, as in Mulder, the corrections from the two coarse grids are averaged. Thus, 
the weights are given by 


These weights give a two-grid convergence rate of approximately 1/2 in the case of strong 
alignment, because the appropriate grid gets only half of the needed information. In this 
case, our convergence result does not guarantee good convergence. Thus we should look for 
ways, based on our theorem, to improve the convergence of this method. 

Recall that only semicoarsening in the direction of strongest coupling can significantly 
reduce the frequencies which cannot be reduced by the point relaxation. The weights provide 
a way of “switching” to the appropriate coarse grid in the cases of strong alignment in one 
of the coordinate directions. For our model problem, olu xx + 7 u i/y = /> we have some degree 
of freedom in the choice of the weights. We could take, for instance, 


or 


wi = 


a 2 + 7 2 


U >2 = 


a 2 + 7 2 


Since the appropriate grid gets all of the needed information in the case of strong alignment, 
these weights can lead to convergence rates which can be made arbitrarily small by increasing 
the number of relaxation sweeps. 

In general, and lo 2 will vary over the domain and we will not know the relative strengths 
a and 7 explicitly. Suppose we know that, locally, all modes which cannot be efficiently 
reduced by point relaxation can be well approximated on the same semicoarsened coarse 
grid. That is, suppose semicoarsening can be used locally to accelerate the convergence. 
In this case, we would like to determine the most efficient direction of semicoarsening and 
choose our weights accordingly. One way to do this is to test the operator, at the given grid 
point, on two different high frequency Fourier modes, one oscillatory only in the x-direction, 
the other oscillatory only in the y-direction. The two modes, call them u and u, which are 
most natural look locally like: 


1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 

1 

-1 


u 


Appropriate weights at the grid point (i, j) 


-1 -1 -1 -1 -1 -1 

111111 

-1 -1 -1 -1 -1 -1 

111111 

-1 -1 -1 -1 -1 -1 

111111 

v 

be determined by applying the operator, 
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A m,n to u and v. We define 


(A m ’ n u)(i,j) = \ u (i,j) and (A m,n u)(i, j) = A„(i, j). 
Then a reasonable choice for the weights is: 


m,n / ■ *\ 


\l 


a?. + a r 


m,n / « 

^2 (hJ) = 


A?. 


A’ + A* 


Thus, if there is a direction in which semicoarsening can give an acceptable convergence rate, 
this method should find that direction. 


6 Theoretical Complexity - Sequential and Parallel 

As shown in Mulder [5], the cost of a sequential MSG V-cycle is proportional to the total 
number of points on all grids. In two dimensions, where the fine grid consists of M x N 
points, there is a total of (2 M — l)(2N — 1) grid points on all of the grids combined, as can 
be easily seen by arranging all of the grids as in Figure 3. Thus, there are approximately four 
times as many points on all the grids as there are on the finest grid. A similar arrangement 
of all of the grids obtained by semicoarsening a three dimensional L x M x N grid is also 
shown in Figure 3 giving a total of (2 L — 1)(2M — 1)(2AT — 1) grid points on all of the 
grids, approximately 8 times the number of points on the finest grid. The cost of the 
d dimensional algorithm, will be roughly proportional to 2 d times the number of points on 
the finest grid. A parallel implementation of the MSG algorithm is relatively straightforward 
since the computational work on a given level is local and can be performed simultaneously 
at many grids points. For a modest number of processors, most of the computation time is 
spent on the fine grid levels since, on each fine grid level, l, there are 


l+d-l \ 
d~ 1 


MN 

2 l 


l d ~ 1 

(d - l)\2‘ 


MN 


grid points per level. On coarser levels this is an upper bound on the number of grid points. 
Therefore the number of grid points decreases like a polynomial divided by an exponential 
as the levels become coarser. For a large number of processors, approximately equal to the 
number of grid points on the finest grid level N, an equal amount of parallel computation 
time is spent on all grid levels, resulting in a computational cost per V-cycle on the order of 
log(N). 

On message passing machines, the communication between grid levels can become a 
problem as the number of processors is increased. Consider what happens in the extreme 
case where we have as many processors available as we have grid points on the finest grid. 
If we assign one grid point to each processor, then some processors have more work on the 
coarser levels and some processors will have no work, simply because the multiple semicoarse 
grids have some grid points in common. Thus, some sort of re-distribution must occur in 
order to keep the load balanced. We propose two different schemes, a simple scheme involving 
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Figure 3: Successive semicoarsening, total number of points 




transposes, which works only in two dimensions, and a second scheme which offsets the grids 
in order to reduce the communication. Both schemes preserve the computational complexity, 
but have differing communication requirements. The offsetting scheme is ideal for hypercube 
communication networks, since all communication is between nearest neighboring processors. 

The transpose scheme is based on the observation that in two dimensions, even if the 
grids of the same dimensions are not combined, the total number of points on each level does 
not increase. For example, if we start with an 8 x 8 grid on level zero, we get a 4 x 8 and an 
8x4 grid on level one, still having 64 mesh points. On level two, we get a 2 x 8, two 4x4, 
and an 8 x 2 grid. By carefully transposing and packing these grids, we can fit all 2 l grids 
on level l into a 8 x 8 array, as shown in Figure 4. This mapping does keep all calculations 
on a particular grid local, but involves packing and shifting between grid levels. 

In higher dimensions, this type of packing of the coarse grids does not work. Moreover, 
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Figure 4: Transpose scheme 



this scheme involves intense communication in the packing phase which could make it pro- 
hibitively expensive on some machines. For hypercube communication networks, there are 
alternate schemes which keep all of the grid data local. One possibility is to offset the various 
coarse grids to redistribute the load. For example, in two dimensions, using N 2 points and 
( 2N — 1 ) 2 processors, there is a simple one-to-one mapping from all of the points on all of 
the grids to the (2N — l) 2 processors. For every grid level l (0 < l < k), and for every grid 
n m,n on level / (m > 0, n > 0, m + n = /), let the (i,j)th grid point on grid fl m,n be assigned 
to the (j,J) processor where 

i = 2 m+1 i + 2 m - 1 

3 = 2 n+1 i + 2" - 1, 

where 0 < i < N/2 m and 0 < j < Nf 2 m . The quantities 2 m — 1 and 2 n — 1 in the above 
expressions are the horizontal and vertical offsets for the Q m ’ n grid. Since information is only 
transferred to grids differing by one in either m or n, then the relative offsets are always by 
a power of 2 in one direction. This scheme works equally well in three or more dimensions 
and the extension is obvious. 

Finally, we note that the offsetting of the various grids can easily be incorporated into 
the interpolation and restriction operations. During a restriction from f l m ’ n to for 

example, an averaging of three values of the residual in the x direction is immediately followed 
by a shift of all values to the processor which is 2 m to the right. Similarly for the coarsening 
in the y direction. During the interpolation, this process is reversed. Interpolated values 
are shifted to the left. This method automatically maintains the correct offset for all of the 
grids and increases the communication by only a constant. 

Note that, when relaxations are only performed on one grid level at a time, it is sufficient 
to offset the grids in only one of the two coordinate directions, using only N(2N — 1) 
processors. 

7 Experimental Results 

Experimentally, the MSG algorithm converges extremely well for the model problem au xx + 
7 u yy = f i using the weights suggested in Section 5. Asymptotic convergence rates are given 
in Table 1. These were obtained using a random initial approximation to the solution, 
rescaling after each iteration and observing the limit of the ratio of subsequent errors (/ 2 ). 
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All available coarse grid levels were used, with two red/black SOR relaxation sweeps per 
grid level and an exact solve on the coarsest level. The convergence rates are seen to be 
uniformly small for all ratios, a/ 7 . 

MSG was also used on Poisson’s equation on non-uniformly stretched grids. Chebyshev 
grids were used in both directions, with convergence comparable to the uniformly stretched 
grids. See Table 1. The convergence rates for grids which have Chebyshev stretching in 
only one direction are also given and can be seen to be in the same range as for the model 
problem. 

The last entry in Table 1 is for exponential stretching of the grid in one of the coordinate 
directions. The exponential stretching is done so that the ratio of the lengths of the first 
and the last cell is 10,000. The convergence rates are slightly worse, but still appear to be 
bounded independently of grid size. 



8 x 8 

16 x 16 

32 x 32 

64 x 64 

Uniform Grid 





a/7 = 1 

0.07 

0.09 

0.10 

0.10 

O 

t — H 

II 

*- 

lr 

0.13 

0.15 

0.15 

0.15 

a/7 = 100 

0.16 

0.19 

0.19 

0.19 

a/7 = 1000 

0.16 

0.19 

0.21 

0.21 

Chebyschev Grid 

0.14 

0.15 

0.16 

0.16 

U niform/ Chebyshev 

0.09 

0.13 

0.15 

0.16 

Exponential Stretching 

0.15 

0.18 

0.19 

0.20 


Table 1: Asymptotic convergence rates of MSG on various types of grids 


On massively parallel architectures, the relaxation sweeps in the MSG algorithm can be 
performed concurrently on all grids on all grid levels using the CMG algorithm of Gannon 
and Van Rosendale [ 2 ]. The combined CMG/MSG algorithm proceeds in two phases. In the 
first phase, the relaxation is performed on all grids, on all levels. The second phase is the 
intergrid transfer phase, in which residuals and corrections from each grid are transferred to 
neighboring coarse and fine grids, respectively. Experimental results indicate that the ro- 
bustness properties of the MSG algorithm are retained. In Table 2, the observed convergence 
rates are given for the model problem. Note that we again observe that strong alignment 
does not seriously degrade the convergence. The convergence rates per concurrent iteration 
are mostly in the 0.4-0 . 6 range. 
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8 x 8 

16 x 16 

32 x 32 

64 x 64 

Uniform Grid 

a h = i 

0.42 

0.52 

0.59 

0.60 

a /7 = 10 

0.46 

0.48 

0.59 

0.61 

a /7 = 100 

0.40 

0.43 

0.52 

0.57 

a /7 = 1000 

0.41 

0.43 

0.51 

0.55 

Chebyschev Grid 

0.37 

0.44 

0.50 

0.53 

Uniform/Chebyshev 

0.44 

0.53 

0.56 

0.57 

Exponential Stretching 

0.48 

0.59 

0.63 

0.63 


Table 2: Observed convergence rates of MSG/CMG on various types of grids 
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